Abstract

DNA looping plays a fundamental role in a wide variety of biological processes, providing the
backbone for long-range interactions on DNA. Here we develop the first model for DNA looping
by an arbitrarily large number of proteins and solve it analytically in the case of identical
binding. We uncover a switch-like transition between looped and unlooped phases and identify
the key parameters that control this transition. Our results establish the basis for the
quantitative understanding of fundamental cellular processes like DNA recombination, gene
silencing, and telomere maintenance.

PACS numbers: 87.14.Gg, 87.15.He, 05.50.+q, 87.80.Vt






The formation of DNA loops by the binding of proteins and protein complexes at distal
DNA sites plays a fundamental role in many cellular processes, including transcription,
recombination, replication, and telomere maintenance. Disruption or alteration of these
processes often results in developmental disorders and disease states, with cancer the most
prominent example. The key role of looping is to bypass the one-dimensional nature of DNA
and allow distal DNA sites to come close to each other. In gene regulation, proteins bound
far away from the genes they regulate can be brought to the transcription initiation region
of the regulated genes by looping the intervening DNA. Similarly, in DNA recombination,
loops are formed that bring together two DNA regions to transfer genetic information from
one DNA region to another. Although there are studies of double-stranded DNA looping by
DNA itself (cyclization) [10], by one protein [12], or by a few proteins, a general
understanding of the collective properties that might emerge when multiple proteins are
involved is still lacking. The case of multiple proteins is especially important because it
is the dominant one for loops larger than a few hundred base pairs [1].

In this letter we develop the first model for DNA looping by an arbitrary number of
proteins. For a small number of proteins, this model accounts for previous thermodynamic
approaches that have been shown to reproduce in detail available experimental data on
regulation of the lac operon and phage λ [1]. For a large number of proteins, we show here
that the model exhibits properties reminiscent of phase transitions, with a quasi-discontinuity
in the occupancy of the DNA sites by DNA-binding proteins. We identify the parameters
that control the transition and show that there are two phases that can be associated with
looped and unlooped states of DNA. The density of proteins on DNA is low for the unlooped
state and high for the looped state. Despite the apparent one-dimensional physical nature
of the problem, looping of DNA introduces long-range interactions, which make the system
exhibit unexpected collective features.

We consider a system with two spatially distinct DNA regions on the same DNA double
strand, referred to as upstream (U) and downstream (D) operators (Figure 1). Each operator
has binding sites for proteins that, once bound to one of them, can interact with their
symmetric counterparts on the other operator if DNA is looped. The typical way to obtain
the statistical properties of the system is to identify the representative states and their
corresponding free energies and to compute the partition function. This process is usually
done by tabulating the free energies and explicitly writing down the sums of Boltzmann
factors for all the states. For large systems, however, this procedure is not practical because
of the exponential growth of the potential number of states (e.g., for N = 3, there are already
$2^{2N+1} = 128$ states).

The facts that the free energy of a state can be decomposed into different contributions
and that the states can be labelled by discrete variables allow for a Hamiltonian
description of the system. Here, we describe the binding of proteins to DNA through binary
variables $\sigma_{U,i}$ and $\sigma_{D,i}$, which indicate whether (= 1) or not (= 0) a protein is
bound to site i at the upstream or downstream operator, respectively. Similarly, an additional
binary variable $\sigma_L$ indicates whether DNA is looped (= 1) or not (= 0). In terms of this set
of binary variables the system is described by the following Hamiltonian:

$$H = \left(c + e\sum_{i=1}^{N}\sigma_{U,i}\,\sigma_{D,i}\right)\sigma_L + g\sum_{i=1}^{N}\left(\sigma_{U,i}+\sigma_{D,i}\right), \tag{1}$$

where g is the change in free energy upon binding of a protein to a DNA site; e is the free
energy of interaction between proteins symmetrically bound at opposite operators; and c
is the free energy of forming the DNA loop [1, 15]. Therefore, the free energy of each of
the $2^{2N}$ looped and $2^{2N}$ unlooped states is obtained directly from the previous Hamiltonian.
The dependence of the Hamiltonian on the concentration of binding proteins n enters, in
the usual form, through the quantity g, which can be viewed as a chemical potential:
$g = g^{\circ} - \beta^{-1}\ln n$, where $g^{\circ}$ denotes the value of g at a protein concentration of 1 M
and $\beta^{-1} = RT$ (the gas constant times the absolute temperature). Hamiltonians of this type
account for thermodynamic models that have recently been shown to accurately describe gene
regulation in the lac operon by the lac repressor (N = 1) and in phage λ by the CI2 repressor
(N = 3). A systematic analysis for large systems, however, is still missing.
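
The state counting above can be checked by brute force for small N. The following is a minimal Python sketch, not the authors' code: it reads Equation (1) as H = (c + e Σ σ_U,i σ_D,i) σ_L + g Σ (σ_U,i + σ_D,i), enumerates every state, and sums the Boltzmann factors. The function names are hypothetical and the parameter values are left to the caller.

```python
from itertools import product
from math import exp


def hamiltonian(sigma_u, sigma_d, sigma_l, c, e, g):
    """Free energy of one state, reading Equation (1) term by term."""
    paired = sum(u * d for u, d in zip(sigma_u, sigma_d))  # symmetric pairs bound
    bound = sum(sigma_u) + sum(sigma_d)                    # total bound proteins
    return (c + e * paired) * sigma_l + g * bound


def brute_force_partition(N, c, e, g, beta):
    """Sum Boltzmann factors over all 2**(2N+1) states.

    Returns (Z, number_of_states). Only practical for small N.
    """
    Z, count = 0.0, 0
    for sigma_l in (0, 1):
        for sigma_u in product((0, 1), repeat=N):
            for sigma_d in product((0, 1), repeat=N):
                Z += exp(-beta * hamiltonian(sigma_u, sigma_d, sigma_l, c, e, g))
                count += 1
    return Z, count
```

For N = 3 the loop visits 128 states, matching the count quoted in the text; the cost grows as 4^N, which is why a closed form is needed for large systems.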

In order to compute the partition function, it is convenient to rewrite the Hamiltonian
as the sum of quasi-independent single-pair Hamiltonians:

$$H = \sum_{i=1}^{N} H_{p,i}, \tag{2}$$

where

$$H_{p,i} = \sigma_L\left(c/N + e\,\sigma_{U,i}\,\sigma_{D,i}\right) + g\left(\sigma_{U,i}+\sigma_{D,i}\right). \tag{3}$$

The coupling of single-pair Hamiltonians is established through the three-body terms
$e\,\sigma_{U,i}\,\sigma_{D,i}\,\sigma_L$, which account for the interactions between DNA looping and DNA-bound
proteins.

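The quasi-independence property allows us to express the partition function as

$$Z = \sum_{\sigma_L=\{0,1\}}\ \prod_{i=1}^{N} Z_{p,i}, \tag{4}$$

with

$$Z_{p,i} = \sum_{\sigma_{U,i},\,\sigma_{D,i}=\{0,1\}} e^{-\beta H_{p,i}}, \tag{5}$$

which leads to

$$Z = \left(1+e^{-\beta g}\right)^{2N} + e^{-\beta c}\left(1 + 2e^{-\beta g} + e^{-\beta(2g+e)}\right)^{N}. \tag{6}$$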
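
The factorization into single-pair terms can be illustrated numerically. The sketch below is an illustration under stated assumptions, with hypothetical function names: it sums the four occupancy states of one site pair at fixed loop state, raises the result to the power N, and compares that with the closed form obtained by carrying out the sums by hand, Z = (1 + y)^{2N} + e^{-βc}(1 + 2y + y² e^{-βe})^N with y = e^{-βg}.

```python
from math import exp, isclose


def pair_partition(sigma_l, N, c, e, g, beta):
    """Z_p for one site pair at fixed loop state: sum over its four occupancy states."""
    total = 0.0
    for s_u in (0, 1):
        for s_d in (0, 1):
            h_pair = sigma_l * (c / N + e * s_u * s_d) + g * (s_u + s_d)
            total += exp(-beta * h_pair)
    return total


def partition_factorized(N, c, e, g, beta):
    """Sum over the loop variable of the N-th power of the single-pair sum."""
    return sum(pair_partition(s_l, N, c, e, g, beta) ** N for s_l in (0, 1))


def partition_closed_form(N, c, e, g, beta):
    """Closed form with y = exp(-beta*g); the looping cost c appears once."""
    y = exp(-beta * g)
    unlooped = (1 + y) ** (2 * N)
    looped = exp(-beta * c) * (1 + 2 * y + y * y * exp(-beta * e)) ** N
    return unlooped + looped
```

The two functions should agree to rounding error for any N, while the factorized form needs only eight Boltzmann factors instead of 2^(2N+1).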



The two properties of interest are the looping probability and the occupancy of the sites,
which follow straightforwardly from the previous expression of the partition function. The
probability of the looped state is given by the average value of $\sigma_L$,
$\langle\sigma_L\rangle = -\frac{1}{\beta}\frac{\partial}{\partial c}\ln Z$. After
taking the logarithmic derivative and performing algebraic manipulations, we obtain

$$\langle\sigma_L\rangle = \frac{1}{1+X^{N}}, \tag{7}$$

with

$$X = e^{\beta c/N}\,\frac{\left(1+e^{-\beta g}\right)^{2}}{1 + 2e^{-\beta g} + e^{-\beta(2g+e)}}. \tag{8}$$

This expression for $\langle\sigma_L\rangle$ indicates that, for large N, there is the potential for a sharp
transition between two states: the loop is always present if X < 1 and absent if X > 1.
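
How the transition sharpens with N can be seen by evaluating the looping probability 1/(1 + X^N) directly. The sketch below is a hedged illustration with hypothetical function names; it works with ln X so that X^N does not overflow for large N, and takes X = e^{βc/N}(1 + y)²/(1 + 2y + y² e^{-βe}) with y = e^{-βg}.

```python
from math import exp, log


def log_X(N, c, e, g, beta):
    """ln X, with y = exp(-beta*g)."""
    y = exp(-beta * g)
    looped_weight = 1 + 2 * y + y * y * exp(-beta * e)
    return beta * c / N + 2 * log(1 + y) - log(looped_weight)


def looping_probability(N, c, e, g, beta):
    """Probability of the looped state, 1 / (1 + X**N), evaluated through ln X."""
    z = N * log_X(N, c, e, g, beta)  # z = ln(X**N)
    if z > 0:
        w = exp(-z)                  # X**N > 1: use X**-N / (1 + X**-N)
        return w / (1 + w)
    return 1 / (1 + exp(z))
```

Holding c/N fixed and increasing N leaves X unchanged, so the probability moves toward 1 when X < 1 and toward 0 when X > 1, which is the switch-like behavior described above.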

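This discontinuity can also propagate to the probability for a site to be occupied, given by
$\langle\sigma_{U/D,i}\rangle = -\frac{1}{2N\beta}\frac{\partial}{\partial g}\ln Z$, which is related to the looping probability through

$$\langle\sigma_{U/D,i}\rangle = \left(1-\langle\sigma_L\rangle\right)\frac{e^{-\beta g}}{1+e^{-\beta g}} + \langle\sigma_L\rangle\,\frac{e^{-\beta g}+e^{-\beta(2g+e)}}{1+2e^{-\beta g}+e^{-\beta(2g+e)}}. \tag{9}$$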

Under physiological conditions, the parameter typically used by the cell to control DNA
looping is the protein concentration. Figure 2 shows the system behavior as a function of
the protein concentration and the number of binding sites for representative values of the
parameters. The figure illustrates the presence of looped and unlooped phases (Figure
2a). Only for intermediate concentrations does the occupancy of the sites (Figure 2b) display
a discontinuous behavior. For concentrations in the high and low extremes, DNA looping
does not substantially affect the binding of proteins.
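
The dependence on protein concentration can be sketched numerically. The code below is an illustration, not the authors' implementation: it assumes the concentration n is given in molar units, uses e^{-βg} = n e^{-βg°}, and writes the occupancy as a weighted average of the unlooped and looped single-site occupancies, which is what the derivative of ln Z with respect to g gives for the partition function of this model. The function name is hypothetical.

```python
from math import exp, log


def loop_and_occupancy(n_molar, N, c, e, g0, beta):
    """Return (P_loop, P_bound) at protein concentration n_molar (in M)."""
    y = n_molar * exp(-beta * g0)                   # exp(-beta*g), g = g0 - ln(n)/beta
    w_unlooped = (1 + y) ** 2                       # pair weight without loop
    w_looped = 1 + 2 * y + y * y * exp(-beta * e)   # pair weight with loop
    # ln(X**N): looping cost c enters once, pair weights enter N times
    z = beta * c + N * (log(w_unlooped) - log(w_looped))
    p_loop = 0.0 if z > 700 else 1 / (1 + exp(z))   # guard against overflow in exp
    occ_unlooped = y / (1 + y)
    occ_looped = (y + y * y * exp(-beta * e)) / w_looped
    p_bound = (1 - p_loop) * occ_unlooped + p_loop * occ_looped
    return p_loop, p_bound
```

With the parameter values quoted in the caption of Figure 2, scanning n over several decades should show both quantities switching together at intermediate concentrations, while at very low or very high n the occupancy follows simple binding regardless of looping.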

The concentration n at which the transition happens (X = 1) is given by

$$n = e^{\beta g^{\circ}}\,\frac{\left(e^{\beta c/N}-1\right) + \sqrt{\left(e^{-\beta e}-1\right)\left(e^{\beta c/N}-1\right)}}{e^{-\beta e}-e^{\beta c/N}}. \tag{10}$$

This equation has a positive solution if and only if e < -c/N. If e > -c/N there is no
positive solution and the sites become occupied as the concentration increases without the
system ever reaching the looped state (Figure 2). Therefore, the inter-operator protein
interactions need to exceed a strength threshold in order for DNA looping to have the
potential to be present. Remarkably, this threshold goes to zero as the number of binding
sites increases. This constraint correlates with the general trend that the number of proteins
used to tie the DNA loop increases with the length of the loop. A longer loop typically
implies a higher free energy of looping, c, which in turn requires a stronger interaction
between proteins (a more negative e) or a higher number of sites in order for the system to
switch to the looped state.
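
The threshold condition and the transition point can be sketched as follows. This is a minimal illustration with hypothetical function names, assuming c > 0 and concentrations in molar units. It solves X = 1 as a quadratic in y = e^{-βg} = n e^{-βg°}; the positive root simplifies to y = 1/(sqrt((e^{-βe} - 1)/(e^{βc/N} - 1)) - 1), which is the form used in the code.

```python
from math import exp, expm1, log, sqrt


def transition_concentration(N, c, e, g0, beta):
    """Concentration (in M) at which X = 1, or None when e >= -c/N.

    Assumes c > 0, so that the looped phase requires an attractive e < 0.
    """
    if e >= -c / N:
        return None                  # interaction too weak: no looped phase
    a = expm1(beta * c / N)          # exp(beta*c/N) - 1
    b = expm1(-beta * e)             # exp(-beta*e) - 1
    y = 1.0 / (sqrt(b / a) - 1.0)    # value of exp(-beta*g) at the transition
    return exp(beta * g0) * y


def log_X(n_molar, N, c, e, g0, beta):
    """ln X at concentration n_molar, used to confirm that X = 1 at the transition."""
    y = n_molar * exp(-beta * g0)
    return beta * c / N + 2 * log(1 + y) - log(1 + 2 * y + y * y * exp(-beta * e))
```

With c = 30 kcal/mol and e = -5.5 kcal/mol, the function returns None for N = 5 and a finite concentration for N = 6, consistent with the threshold N > -c/e stated in the caption of Figure 2.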

A remarkable property inferred from the previous equations is that the looping free energy
and the number of binding sites affect the concentration at which the transition occurs only
through the ratio c/N. If this ratio is kept constant, coordinated changes in c and N modify
the sharpness of the transition but not the concentration at which it happens (Figure 3).
The main trends observed in the looping behavior with respect to c and N are also observed
in the occupancy of the sites (Equation 9), which depends on c and N only through the
looping probability.

In the case of large N, by expanding in terms of the dimensionless parameter $\beta c/N$, the
previous equation simplifies to

$$n \simeq e^{\beta g^{\circ}}\sqrt{\frac{\beta c}{N\left(e^{-\beta e}-1\right)}}, \tag{11}$$

which indicates that the concentration at which the transition happens decreases asymptotically
like $N^{-1/2}$ as the number of binding sites increases (as demonstrated in Figure 2).
Therefore, by increasing the number of binding sites, the system can reach the looped phase
at arbitrarily small protein concentrations.
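
The quality of the large-N approximation can be checked numerically. The sketch below is an illustration with hypothetical function names: it compares the exact solution of X = 1 with the asymptotic form n ≈ e^{βg°} sqrt(βc / (N (e^{-βe} - 1))), assuming c > 0, e < -c/N, and concentrations in molar units.

```python
from math import exp, expm1, sqrt


def n_exact(N, c, e, g0, beta):
    """Transition concentration from X = 1 (requires c > 0 and e < -c/N)."""
    ratio = expm1(-beta * e) / expm1(beta * c / N)
    return exp(beta * g0) / (sqrt(ratio) - 1.0)


def n_asymptotic(N, c, e, g0, beta):
    """Large-N approximation: leading order in beta*c/N."""
    return exp(beta * g0) * sqrt(beta * c / (N * expm1(-beta * e)))
```

Quadrupling N halves the asymptotic value, which is the N^{-1/2} scaling, and the ratio of the exact to the asymptotic result approaches 1 as N grows.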

This asymptotic equation indicates that, for large N, changes of e and N that keep
$N(e^{-\beta e}-1)$ constant do not affect the transition point. Because of the strong dependence
of the occupancy on e for looped phases (Equation 9), coordinated changes of e and N can
keep the looping properties while strongly affecting the occupancy of the sites (Figure 4).
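
A coordinated change of e and N can be constructed explicitly. The sketch below is an illustration under stated assumptions: the function names and the reference point (N_ref, e_ref) are hypothetical. It picks e(N) so that N(e^{-βe} - 1) stays at its reference value and then evaluates the asymptotic transition concentration and the occupancy of a site in the looped state.

```python
from math import exp, expm1, log1p, sqrt


def coordinated_e(N, N_ref, e_ref, beta):
    """Interaction free energy e(N) keeping N*(exp(-beta*e) - 1) at its reference value."""
    K = N_ref * expm1(-beta * e_ref)
    return -log1p(K / N) / beta


def asymptotic_transition(N, c, e, g0, beta):
    """Large-N transition concentration (in M)."""
    return exp(beta * g0) * sqrt(beta * c / (N * expm1(-beta * e)))


def looped_site_occupancy(y, e, beta):
    """Occupancy of a site when DNA is looped, with y = exp(-beta*g)."""
    w = y * y * exp(-beta * e)
    return (y + w) / (1 + 2 * y + w)
```

Along this family the asymptotic transition concentration is unchanged, while e becomes less negative as N grows, so the occupancy reached in the looped phase near the transition drops substantially.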

In conclusion, we have developed the first model for DNA looping by an arbitrary number
of proteins and found that, for a large number of binding sites, the system exhibits
phase-transition-like behavior with two phases in which DNA is either looped or unlooped. Many
cellular processes rely on the existence of a looped phase to work (e.g. telomere maintenance),
others on the occupancies of the sites that come with the looped phase (e.g. gene regulation),
and others on the transition from one phase to another to trigger their effects (e.g.
DNA recombination). Our results indicate that DNA looping by multiple proteins has a
high versatility to achieve different behaviors. Explicitly, the system can reach the looped
phase at arbitrarily small protein concentrations, the sharpness of the transition can easily
be tuned, and the system can choose the degree to which switching to the looped state
affects occupancy of the DNA binding sites. This versatility underlies the many facets of
DNA looping across the spectrum of biological processes where it is at play.

The model we have proposed and its potential extensions encompass a broad range of
biological processes. The case of identical binding we have discussed here in detail closely
approximates DNA looping in DNA recombination and telomere maintenance [3]. Both
of these processes play a fundamental role in the functioning of the cell and their deregulation
is responsible for a variety of diseases, including different types of cancer. Our
model provides a backbone to build upon and to tackle more complex situations, involving
for instance non-identical binding, multiple loops, and intra-operator interactions.
From a methodological point of view, our approach provides a full Hamiltonian formulation
of DNA looping that opens the applicability of the techniques of statistical physics, both
computational and analytical, to a new range of biological problems of basic and medical
importance.

$$\ldots + \sum_{i=1}^{N}\left(g_{U,i}\,\sigma_{U,i} + g_{D,i}\,\sigma_{D,i}\right),$$

where the binding free energy $g_{U/D,i}$ depends on the site, there are M different types of DNA
loops with potentially different free energies $c_k$, and the interaction free energy $e_{i,j,k}$ between
proteins bound at opposite operators depends on the type of loop and the particular binding
sites.

Figure 1: Schematic representation of looped DNA. Proteins (filled circles) bind to DNA (black
line) at specific sites (rectangles on the line). Proteins bound at one operator, upstream (U) or
downstream (D), can interact with their counterparts at the opposite operator if DNA forms a
loop (L). In this example, the number of binding sites per operator is N. The binary variables
$\sigma_{U,i}$ and $\sigma_{D,i}$ are 1 when proteins are bound to the corresponding DNA site and are 0 otherwise.
Here, only the two proteins bound at sites i = N on the upstream (U,N) and downstream (D,N)
operator interact with each other.

Figure 2: Looping probability $P_{loop}$ (a) and site occupancy $P_{bound}$ (b) as functions of the protein
concentration n (in nM) and the number of binding sites per operator N. The values of the
parameters are $\beta^{-1}$ = 0.6 kcal/mol, $g^{\circ}$ = -7.2 kcal/mol, c = 30 kcal/mol, e = -5.5 kcal/mol.
Black and white colors in the 2-D density plot projections of the 3-D surfaces represent probabilities
0 and 1, respectively. The green line corresponds to the transition concentration (given by
Equation 10) and indicates the separation between looped and unlooped phases (regions with
concentrations above and below the line, respectively). Note that there is no looped phase for
N < -c/e = 5.45.

Figure 3: Looping probability $P_{loop}$ and site occupancy $P_{bound}$ as functions of the protein
concentration n (in nM) for coordinated changes of the free energy of looping and number of binding
sites. The values of the parameters are $\beta^{-1}$ = 0.6 kcal/mol, $g^{\circ}$ = -7.2 kcal/mol, e = -7.5 kcal/mol,
c = 0.1N kcal/mol, N = 10 (top), N = 100 (middle), and N = 1000 (bottom).

Figure 4: Looping probability $P_{loop}$ and site occupancy $P_{bound}$ as functions of the protein
concentration n (in nM) for coordinated changes in the interaction free energy and number of binding
sites. The values of the parameters are $\beta^{-1}$ = 0.6 kcal/mol, $g^{\circ}$ = -7.2 kcal/mol, c = 30 kcal/mol,
e = -10.96 kcal/mol + $\beta^{-1}\ln N$, N = 100 (top), N = 1000 (middle), and N = 10000 (bottom).